Space-efficient K-MER algorithm for generalized suffix tree
نویسندگان
چکیده
Suffix trees have emerged to be very fast for pattern searching yielding O (m) time, where m is the pattern size. Unfortunately their high memory requirements make it impractical to work with huge amounts of data. We present a memory efficient algorithm of a generalized suffix tree which reduces the space size by a factor of 10 when the size of the pattern is known beforehand. Experiments on the chromosomes and Pizza&Chili corpus show significant advantages of our algorithm over standard linear time suffix tree construction in terms of memory usage for pattern searching.
منابع مشابه
Space-efficient K-mer Algorithm for Generalised Suffix Tree
Suffix trees have emerged to be very fast for pattern searching yielding O (m) time, where m is the pattern size. Unfortunately their high memory requirements make it impractical to work with huge amounts of data. We present a memory efficient algorithm of a generalized suffix tree which reduces the space size by a factor of 10 when the size of the pattern is known beforehand. Experiments on th...
متن کاملSuffix Tree of Alignment: An Efficient Index for Similar Data
We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes in A and B. It has |A|+ |B| leaves and can be constructed in O(|A|+ |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not ex...
متن کاملSpace-efficient Data Structures for String Searching and Retrieval
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 The Models of Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Our Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...
متن کاملSuffix Array of Alignment: A Practical Index for Similar Data
The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretica...
متن کاملUsing the Sadakane Compressed Suffix Tree to Solve the All-Pairs Suffix-Prefix Problem
The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space efficient solutions. In this paper, we present a space-economical solution to this problem using the generalized Sadakane co...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1703.02224 شماره
صفحات -
تاریخ انتشار 2017